An emerging body of research is focusing on understanding and building artificial 
systems that can achieve open-ended development influenced by intrinsic motivations. In 
particular, research in robotics and machine learning is yielding systems and algorithms 
with increasing capacity for self-directed learning and autonomy. Traditional software 
architectures and algorithms are being augmented with intrinsic motivations to drive 
cumulative acquisition of knowledge and skills. Intrinsic motivations have recently been 
considered in reinforcement learning, active learning and supervised learning settings 
among others. This paper considers game theory as a novel setting for intrinsic motivation. 
A game theoretic framework for intrinsic motivation is formulated by introducing the 
concept of optimally motivating incentive as a lens through which players perceive a game. 
Transformations of four well-known mixed-motive games are presented to demonstrate 
the perceived games when players' optimally motivating incentive falls in three cases 
corresponding to strong power, affiliation and achievement motivation. We use agent-based 
simulations to demonstrate that players with different optimally motivating incentives 
act differently as a result of their altered perception of the game. We discuss the 
implications of these results both for modeling human behavior and for designing artificial 
agents or robots. 
INTRODUCTION 

Game theory is the study of strategic decision-making 
(Guillermo, 1995). It has been used to study a variety of 
human and animal behaviors in economics, political science, 
psychology, biology, and other areas. Game theoretic approaches 
have also been utilized in robotics for tasks such as multi-robot 
coordination and optimization (Meng, 2008; Kaminka et al., 
2010) as well as for analyzing and implementing behavior in 
software agents (Parsons and Wooldridge, 2002). This paper 
presents a game theoretic framework for intrinsic motivation and 
considers how motivation might drive cultural learning during 
strategic interactions. The work provides stepping stones toward 
intrinsically motivated, game theoretic approaches to modeling 
strategic interactions. Potential applications include the study of 
human behavior or modeling open-ended development in robots 
or artificial agents. 

In humans, individual differences in the strength of motives 
such as power, achievement and affiliation have been shown to 
have a significant impact on behavior in social dilemma games 
(Terhune, 1968; Kuhlman and Marshello, 1975; Kuhlman and 
Wimberley, 1976; Van Run and Liebrand, 1985) and during other 
kinds of strategic interactions (Atkinson and Litwin, 1960). Some 
models of these phenomena exist for artificial agents (Simkins 
et al., 2010; Merrick and Shafi, 2011), but these models have not 
yet been widely studied for strategic interactions, competition 
and cooperation between artificial agents. 

This paper presents a game theoretic approach to modeling 
differences in decision-making between individuals caused 
by differences in their perception of the payoff during certain 
strategic interactions. Specifically we consider cases where dif- 
ferences in perception are caused by different motivational pref- 
erences held by individuals. We study strategic decision-making 
in the context of mixed-motive games. Four archetypical two- 
by-two mixed-motive games are considered: prisoner's dilemma 
(PD), leader, chicken, and battle-of-the-sexes (BoS) (Rapoport, 
1967; Colman, 1982). We introduce the concept of optimally 
motivating incentive and demonstrate that agents with different 
optimally motivating incentives perceive the four games differ- 
ently. We show that the perceived games have different Nash 
Equilibrium (NE) points (Nash, 1950) to the original games. This 
causes agents with different optimally motivating incentives to act 
differently. We discuss the implications of these results both for 
modeling human behavior and for designing artificial agents or 
robots with certain behavioral characteristics. 

In the remainder of this Section, section Mixed-Motive Games 
introduces mixed-motive games and section Solution Strategies 
for Mixed-Motive Games reviews relevant existing models of 
strategic decision-making. Section Solution Strategies for Mixed- 
Motive Games also discusses the specific contributions of this 
paper in that context and introduces the background formal 
notations used in the rest of the paper. Section Incentive-Based 
Models of Motivation reviews literature from motivational psy- 
chology about the influence of incentive-based motivation on 
decision-making as inspiration for the new models in section 
Materials and Methods. Section Materials and Methods introduces 
our new notation for incentives and shows how each of 
the four mixed-motive games is transformed into various new 
games when different optimally motivating incentives are chosen 
for agent players. Section Results presents a suite of agent-based 
simulations demonstrating that players with different optimally 
motivating incentives act differently as a result of their altered 
perception of the game. We conclude in section Discussion with a 
discussion of the implications of the work and future directions it 
may take. 

MIXED-MOTIVE GAMES 

This paper will consider two-player mixed-motive games with 
the generic structure shown in Matrix 1. Each player (Player 1 
and Player 2) has a choice of two actions: C or D. Depending 
on the combination of actions chosen by both players, Player 1 is 
assigned a payoff value $V_1$ and Player 2 is assigned a payoff value 
$V_2$. $V_1$ and $V_2$ can have values of T, R, P, or S. The value R is the 
reward if both players choose C. In other words, R is the reward 
for a (C, C) outcome. P is the punishment if both players defect 
[joint D choices leading to a (D, D) outcome]. In a mixed-motive 
game, P must be less than R. T represents the temptation to defect 
(choose action D) from the (C, C) outcome and thus, in a mixed-motive 
game, T must be greater than R. Finally, S is the sucker's 
payoff for choosing C when the other player chooses D. 

Formally, the game G presents players with a payoff matrix: 

$$G = \begin{bmatrix} P & T \\ S & R \end{bmatrix}$$

The generic game G can be used to define a number of specific 
games by fixing the relationships between T, R, P, and S. Four 
well-known two-by-two mixed motive games and the relation- 
ships that define them are (Colman, 1982): 

1. Prisoner's Dilemma: T > R > P > S 

2. Leader: T > S > R > P 

3. Chicken: T > R > S > P 

4. Battle of the Sexes: S > T > R > P 

A number of variations of these games do exist (as well as other 
distinct games), but this paper will focus on the four games as 
defined above. 
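
The four defining orderings can be checked mechanically. The following Python sketch is illustrative only: the function name `classify_game` and the numeric payoffs used to exercise it are assumptions introduced here, not part of the original model.

```python
def classify_game(T, R, P, S):
    """Name the mixed-motive game implied by the ordering of T, R, P and S.

    The four orderings follow the definitions listed above (Colman, 1982).
    Any other ordering is reported as "other".
    """
    if T > R > P > S:
        return "Prisoner's Dilemma"
    if T > S > R > P:
        return "Leader"
    if T > R > S > P:
        return "Chicken"
    if S > T > R > P:
        return "Battle of the Sexes"
    return "other"


# Hypothetical payoff values chosen only to satisfy each ordering.
print(classify_game(T=4, R=3, P=2, S=1))  # Prisoner's Dilemma
print(classify_game(T=4, R=2, P=1, S=3))  # Leader
print(classify_game(T=4, R=3, P=1, S=2))  # Chicken
print(classify_game(T=3, R=2, P=1, S=4))  # Battle of the Sexes
```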

Matrix 1. A generic two-by-two mixed-motive game G. T 
must be greater than R and R must be greater than P. 

                      Player 2 
                      D           C 
Player 1      D       P, P        T, S 
              C       S, T        R, R 


The PD game (Rapoport and Chammah, 1965; Poundstone, 
1992) is perhaps the most well-known of the four games stud- 
ied in this paper. It derives its name from a hypothetical strategic 
interaction in which two people are arrested for involvement in 
a crime. They are held in separate cells and cannot communi- 
cate with each other. The police have insufficient evidence for a 
conviction unless at least one of the prisoners discloses certain 
incriminating information. Each prisoner has a choice between 
concealing information from the police (action C) or disclosing it 
(action D). If both conceal, both will be acquitted and the payoff 
to both will be $V_1 = V_2 = R$. If both disclose, both will be 
convicted and receive minor punishments: $V_1 = V_2 = P$. If only 
one prisoner discloses information he will be acquitted and, in 
addition, receive a reward for his information. In this case, the 
prisoner who conceals information will receive a heavy punishment. 
For example, if Player 1 discloses and Player 2 conceals, 
the payoffs will be $V_1 = T$ and $V_2 = S$. Player 2 in this situation 
is sometimes referred to as the "martyr" because he generates 
the highest payoff for the other player and the lowest payoff for 
himself. 

The PD game has been used as a model for arms races, 
voluntary wage restraint, conservation of scarce resources and 
the iconic "tragedy of the commons" (see Colman, 1982, for 
a review). More recently, however, biologists have argued that 
individual variation in motivation and perception means that a 
majority of strategic interactions do not, in fact, conform to the 
PD model (Johnson et al., 2002). The models presented in our 
paper demonstrate one possible explanation for this latter view. 
Specifically, they show how a valid PD matrix can be transformed 
into another game that no longer represents a PD scenario as a 
result of individuals having different motives. 

The game of Leader (Rapoport, 1967) is an analogy for real- 
world interactions such as those between pedestrians or drivers 
in traffic. For example, suppose two pedestrians wish to enter 
a turnstile. Each must decide whether to walk into the turnstile 
first (action D) or concede right of way and wait for the other 
to walk in (action C). If both pedestrians wait, then both will be 
delayed and receive payoffs $V_1 = V_2 = R$. If they both decide to 
walk first, a socially awkward situation results in the worst payoff 
$V_1 = V_2 = P$ to both. If one decides to walk and the other waits, 
the "leader" will be able to walk through unimpeded, receiving 
the highest payoff T, while the "follower" will be able to walk 
through afterwards giving the second best payoff S. Other exam- 
ples of real world interactions abstracted by the Leader game 
include two drivers at opposite ends of a narrow, one-lane bridge, 
or two drivers about to merge from two lanes into one. In some 
such real-world situations there are rules of thumb that prevent 
the leader game from emerging, for example flashing headlights 
at a bridge to concede right of way. However, when such commu- 
nication fails or is impossible, individuals' motivations have an 
influential role in decision-making and in how individuals inter- 
pret the scenario. We make the standard assumption that there is 
no communication between agents. 

In the game of Chicken two motorists speed toward each other 
on a collision course. Each has the option of swerving to avoid a 
collision, and thereby showing themselves to be "chicken" (action 
C) or of driving straight ahead (action D). If both players are 
"chicken," each gets a payoff of $V_1 = V_2 = R$. If only one player 
is "chicken" and the other drives straight on, then the "chicken" 
loses face and the other player, the "exploiter," wins a prestige 
victory. For example, if Player 1 is "chicken" and Player 2 drives, 
the payoffs will be $V_1 = S$ and $V_2 = T$. If both players drive, a 
collision will occur and both players will receive the worst payoff 
$V_1 = V_2 = P$. The game of Chicken has also been used to 
model real-world scenarios in national and international politics 
involving bilateral threats, as well as animal conflicts and 
Darwinian selection of evolutionarily stable strategies (Maynard-Smith, 
1982). 

Finally, the BoS game can be thought of as modeling a 
predicament between two friends with different interests in enter- 
tainment. Each prefers a certain form of entertainment that is 
different to the other, but both would rather go out together than 
alone. If both opt for their preferred entertainment, leading to 
a (C, C) outcome, then each ends up going alone and receiving 
a payoff of $V_1 = V_2 = R$. A worse outcome (D, D) results 
if both make the sacrifice of going to the entertainments they 
dislike, as they both end up alone and $V_1 = V_2 = P$. If, however, 
one chooses their preferred entertainment and the other 
plays the role of "hero" and makes the sacrifice of attending the 
entertainment they dislike, then the outcome is better for both 
of them (either $V_1 = T$ and $V_2 = S$ or $V_1 = S$ and $V_2 = T$). 
The payoff matrix for BoS is relatively similar to that of Leader, 
with the only difference in the definition being the relationship 
between T and S. In Leader T > S, while in BoS S > T. 
This reflects the real-world relationship that is often perceived 
between leadership and sacrifice (Van Knippenberg and Van 
Knippenberg, 2005). We will see in section Results that some 
of the game transformations that are perceived by agents using 
our model of optimally motivating incentive also reflect this 
relationship. 

SOLUTION STRATEGIES FOR MIXED-MOTIVE GAMES 

A strategy $\sigma$ is a function that takes a game as input and outputs 
an action to perform according to some plan of play. This 
paper will focus on pure strategies, such as "always choose action 
C", and mixed strategies that make a stochastic choice between 
two pure strategies with a fixed frequency. Suppose we denote 
the probability that Player 2 will choose action C as $P_2(C)$, then 
the expected payoff for the two pure strategies available to Player 
1 ("always play C" or "always play D") can be computed as 
follows: 

$$E_1(C) = P_2(C)R + [1 - P_2(C)]S$$
$$E_1(D) = P_2(C)T + [1 - P_2(C)]P$$

Using this information, a player can choose the strategy with the 
maximum expected payoff. A variation on this idea that takes 
into account individual differences in preference is utility the- 
ory (Keeney and Raiffa, 1976; Glimcher, 2011). Utility theory 
acknowledges that the values of different outcomes for different 
people are not necessarily equivalent to their raw payoff values V. 
Formally, a utility function U(V) is a twice differentiable func- 
tion defined for V > 0 which has the properties of non-satiation 
[the first derivative U'(V) > 0] and risk aversion [the second 
derivative U"(V) < 0]. The non-satiation property implies that 
the utility function is monotonic, while the risk aversion property 
implies that it is concave. Utility theories were first proposed in 
the 1700s and have been developed and critiqued in a range of 
fields including economics (Kahneman and Tversky, 1979) and 
game theory (Von Neumann and Morgenstern, 1953). 
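
As a concrete illustration of the expected-payoff comparison described earlier in this section, the following Python sketch evaluates both pure strategies for Player 1 and returns the one with the larger expected payoff. The function names and the numeric payoffs are assumptions made for illustration. Raw payoffs are used directly, so no utility function is applied.

```python
def expected_payoffs(T, R, P, S, p2_c):
    """Expected payoffs (E1(C), E1(D)) for Player 1.

    p2_c is the probability that Player 2 chooses action C.
    """
    e_c = p2_c * R + (1 - p2_c) * S
    e_d = p2_c * T + (1 - p2_c) * P
    return e_c, e_d


def best_pure_strategy(T, R, P, S, p2_c):
    """Return "C", "D" or "either" according to the larger expected payoff."""
    e_c, e_d = expected_payoffs(T, R, P, S, p2_c)
    if e_c > e_d:
        return "C"
    if e_d > e_c:
        return "D"
    return "either"


# Hypothetical Prisoner's Dilemma payoffs: D is better whatever Player 2 does.
print(best_pure_strategy(T=4, R=3, P=2, S=1, p2_c=0.5))  # D

# Hypothetical Chicken payoffs: the best reply depends on Player 2.
print(best_pure_strategy(T=4, R=3, P=1, S=2, p2_c=0.0))  # C
print(best_pure_strategy(T=4, R=3, P=1, S=2, p2_c=1.0))  # D
```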

Alternatives have also been proposed to model effects that 
are inconsistent with utility theory. Examples include prospect theory 
(Kahneman and Tversky, 1979) and lexicographic preferences 
(Fishburn, 1974). The models in this paper can also be thought 
of as an alternative to utility theory that uses theories of moti- 
vation to determine how to compute individuals' preferences. 
Various other techniques have been proposed to model decision- 
making under uncertainty, that is, when it is not possible to assign 
meaningful probabilities to alternative outcomes. Many of these 
techniques capture "rules of thumb" or heuristics used in human 
decision-making (Gigerenzer and Todd, 1999). Examples include 
the maximax, maximin, and regret principles. 

The strategies chosen by players and their corresponding 
payoffs constitute a NE (Nash, 1950) if no player can ben- 
efit by changing their strategy while the other player keeps 
theirs unchanged. This latter definition covers mixed strategies 
M in which players make probabilistic random choices between 
actions. Formally, if we consider a pair of strategies, $\sigma_1$ and $\sigma_2$, 
and denote the expected payoff for Player 1 using $\sigma_1$ against 
Player 2 using $\sigma_2$ as $E_1(\sigma_1, \sigma_2)$, then the two strategies are in 
equilibrium if $E_1(\sigma_1, \sigma_2) \geq E_1(\sigma_1', \sigma_2)$ for all $\sigma_1' \neq \sigma_1$. In other 
words, the strategies are in equilibrium if there is no alternative 
strategy for Player 1 that would improve Player 1's expected 
payoff against Player 2 if Player 2 continues to use strategy $\sigma_2$ 
(Guillermo, 1995). 

Suppose we consider the principles discussed above with refer- 
ence to the four games described in section Mixed-Motive Games. 
In the PD game there is a pure strategy equilibrium point (D, D) 
from which neither player benefits from unilateral deviation, 
although both benefit from joint deviation. We can visualize this 
game in terms of expected payoff as shown in Figure 1. We denote 
the probability of Player 2 choosing C as $P_2(C)$, the expected payoff 
if Player 1 chooses D as $E_1(D)$, and the expected payoff for 
Player 1 choosing C as $E_1(C)$. The visualization shows that the 
definition of PD (T > R > P > S) implies that $E_1(D) > E_1(C)$ 
regardless of $P_2(C)$. In other words, the strategy of choosing D 
dominates the strategy of choosing C. The NE for this game 
(D, D) is shown circled in Figure 1. 

In contrast to the PD game, the Leader, Chicken and BoS 
games all have $E_1(D) > E_1(C)$ for $P_2(C) = 1$ and $E_1(D) < E_1(C)$ 
for $P_2(C) = 0$. In other words, these games have two asymmetric 
equilibrium points (C, D) and (D, C). However, neither of 
these equilibrium points is strongly stable because the players disagree 
about which is preferable. The three games do, however, 
have a mixed-strategy NE, meaning that players will tend to evolve 
strategies that choose C with some fixed probability. We can also 
visualize these games in terms of their expected payoff as shown 
in Figure 2. The NE probability of players choosing C is defined 
by the point at which $E_1(D)$ and $E_1(C)$ intersect, i.e.: 

$$E_1(C) = E_1(D)$$
$$[R - S]P_2(C) + S = [T - P]P_2(C) + P$$

and likewise for $P_1(C)$. 

FIGURE 2 | Visualization of the payoff structures for (A) Leader 
T > S > R > P, (B) Chicken T > R > S > P and (C) Battle of the Sexes 
S > T > R > P. 
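
Rearranging the intersection condition for $P_2(C)$ gives $P_2(C) = (P - S)/(R - S - T + P)$, provided the denominator is non-zero. The Python sketch below evaluates this expression with exact fractions. The function name and the integer payoffs are assumptions made for illustration, and the result is only meaningful as a probability when it lies in [0, 1].

```python
from fractions import Fraction


def mixed_ne_prob_c(T, R, P, S):
    """Probability of choosing C at which E1(C) and E1(D) intersect.

    Solves [R - S] x + S = [T - P] x + P for x. Integer payoffs are
    assumed so that the result can be returned as an exact Fraction.
    """
    denominator = R - S - T + P
    if denominator == 0:
        raise ValueError("expected-payoff lines are parallel; no intersection")
    return Fraction(P - S, denominator)


# Hypothetical payoffs satisfying each game's ordering.
print(mixed_ne_prob_c(T=4, R=2, P=1, S=3))  # Leader: 1/2
print(mixed_ne_prob_c(T=4, R=3, P=1, S=2))  # Chicken: 1/2
print(mixed_ne_prob_c(T=3, R=2, P=1, S=4))  # Battle of the Sexes: 3/4
```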

Evolutionary game theory (Maynard-Smith, 1982) combines 
classical game theory with learning. Evolutionary dynamics pre- 
dict the equilibrium outcomes of a multi-agent system when 
the individual agents use learning algorithms to choose actions 
in iterative game-play. Two-population replicator dynamics, for 
example, model learning when players may have different strategies. 
In this model, suppose we combine the probabilities of Player 
1 playing C and D in a vector form $p = [p_C, p_D]$ such that $p_C = P_1(C)$ 
and $p_D = P_1(D)$, and the probabilities of Player 2 playing C 
and D as $q = [q_C, q_D]$ such that $q_C = P_2(C)$ and $q_D = P_2(D)$. The 
replicator dynamics in this case are: 

$$\Delta p_i = p_i[(Gq)_i - pGq^T] \quad (1)$$
$$\Delta q_j = q_j[(pG^T)_j - pG^Tq^T] \quad (2)$$

where G is the payoff matrix defined by the game being played. In 
this model, pure strategies tend to dominate over time and mixed- 
strategies are unstable. 
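
A minimal numerical sketch of these dynamics is given below. It makes several assumptions that are not part of the original formulation: the continuous dynamics are approximated with a simple Euler step of size `dt`, only the probability of playing C is tracked for each player (the probability of D is its complement), and the payoff values and starting points are hypothetical.

```python
def replicator_step(p_c, q_c, T, R, P, S, dt=0.01):
    """One Euler step of two-population replicator dynamics.

    p_c and q_c are the probabilities that Player 1 and Player 2 play C.
    Each probability grows in proportion to how much the expected payoff
    of C exceeds that player's current mean expected payoff.
    """
    # Player 1: expected payoff of each pure action against q.
    e1_c = q_c * R + (1 - q_c) * S
    e1_d = q_c * T + (1 - q_c) * P
    mean_1 = p_c * e1_c + (1 - p_c) * e1_d
    # Player 2: the game is symmetric, so the same payoffs apply against p.
    e2_c = p_c * R + (1 - p_c) * S
    e2_d = p_c * T + (1 - p_c) * P
    mean_2 = q_c * e2_c + (1 - q_c) * e2_d
    return (p_c + dt * p_c * (e1_c - mean_1),
            q_c + dt * q_c * (e2_c - mean_2))


def simulate(p_c, q_c, T, R, P, S, steps=20000, dt=0.01):
    """Iterate the Euler step and return the final (p_c, q_c)."""
    for _ in range(steps):
        p_c, q_c = replicator_step(p_c, q_c, T, R, P, S, dt)
    return p_c, q_c


# Prisoner's Dilemma: both populations drift toward always playing D.
pd_p, pd_q = simulate(0.5, 0.5, T=4, R=3, P=2, S=1)

# Chicken from an asymmetric start: play settles on the pure outcome (C, D).
ch_p, ch_q = simulate(0.6, 0.4, T=4, R=3, P=1, S=2)
```

In both runs the populations end close to a pure strategy, in line with the observation that mixed strategies are unstable under this model.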

In this paper, we use two-population replicator dynamics to 
model cultural learning (as opposed to biological evolution) 
when mixed-motive games are played iteratively. Borgers and 
Sarin (1997) showed that Cross' learning model for two players 
iteratively playing "habit forming games" converges to asym- 
metric continuous time replicator dynamics. Our approach is a 
stepping-stone toward simulating and analyzing strategic interac- 
tions between agents modeling known motive profiles. 

While classical game theory discussed above offers a wide 
range of insights into behavior in strategic interactions, it is not 
necessarily designed to model human decision-making. In fact, 
there is evidence of humans not conforming to NE strategies in 
many kinds of strategic interaction (Terhune, 1968; McKelvey and 
Palfrey, 1992; Li et al., 2010). As a result, researchers have started 
to develop alternative approaches. The field of behavioral game 
theory (Camerer, 2003, 2004) is concerned with developing models 
of behavior under assumptions of bounded rationality. These 
models take into account factors such as the heterogeneity of a 
population, the ability of individuals to learn and adapt during 
strategic interactions and the role of emotional and psychological 
factors in strategic decision-making. The purposes of this work 
fall into two broad categories: (1) to produce computational models 
that can explain and predict human behavior during strategic 
interactions that does not conform to classical game theoretic 
models (Valluri, 2006) and (2) to build artificial systems that can 
exhibit certain desirable behavioral characteristics such as cooper- 
ation or competitiveness (Sandholm and Crites, 1996; Claus and 
Boutilier, 1998; Vassiliades and Christodoulou, 2010), coopera- 
tion during strategic interactions (Valluri, 2006) and improved 
performance against human adversaries who also have bounded 
rationality and limited observation (Pita et al., 2010). The work 
in our paper differs from previous work in this area by its focus 
on the role of motivation in decision-making. 

INCENTIVE-BASED MODELS OF MOTIVATION 

In motivational psychology, incentive is defined as a situational 
characteristic associated with possible satisfaction of a motive 
(Heckhausen and Heckhausen, 2008). A range of incentive-based 
motivation theories exist, dealing with both internal and exter- 
nal incentives. Examples of internal incentives include the novelty, 
difficulty or complexity of a situation. Examples of external incen- 
tives include money and points or "payoff" in a game. For the 
remainder of this paper we define incentive I as a value that is 
proportional to payoff V defined in section Mixed-Motive Games. 
The key aspect of incentive-based motivation to be embedded in 
the game theoretic framework in this paper is that different indi- 
viduals have different intrinsic preferences for incentives. These 
different intrinsic motivations cause individuals to perceive the 
payoff matrix specified by a game differently and act according to 
their own transformation of that matrix. 

The following sub-sections describe three incentive-based 
models of motivation and the different motivational prefer- 
ences they inspire. While we do not explicitly embed these 
models in our proposed game theoretic framework, they inform 
the cases of optimally motivating incentive and correspond- 
ing game transformations that we study in section Materials 
and Methods. The three motives considered are the "influential 
trio" proposed by Heckhausen and Heckhausen (2008): achieve- 
ment, affiliation, and power motivation. These theories are the 
basis of competence-seeking behavior, relationship-building and 
resource-controlling behavior in humans. 

Achievement motivation 

Achievement motivation drives humans to strive for excellence 
by improving on personal and societal standards of performance. 
Perhaps the foremost psychological model of achievement moti- 
vation is Atkinson's Risk-Taking Model (RTM) (Atkinson, 1957). 
It defines achievement motivation in terms of conflicting desires 
to approach success or avoid failure. Six variables are used: 
incentive for success (equated with value of success); probabil- 
ity of success (equated with difficulty); strength of motivation 
to approach success; incentive for avoiding failure; probability of 
failure; and strength of motivation to avoid failure. Success moti- 
vated individuals perceive an inverse linear relationship between 
incentive and probability of success (Atkinson and Litwin, 1960; 
Atkinson and Raynor, 1974). They tend to favor goals or 
actions with moderate incentives which can be interpreted as 
indicating a moderate probability of success or moderate dif- 
ficulty. We examine the case of success-motivated individuals 
in this paper, by examining the case where individuals with 
a moderate optimally motivating incentive engage in strategic 
interactions. 

Affiliation motivation 

Affiliation refers to a class of social interactions that seek contact 
with formerly unknown or little known individuals and main- 
tain contact with those individuals in a manner that both parties 
experience as satisfying, stimulating and enriching (Heckhausen 
and Heckhausen, 2008). The need for affiliation is activated when 
an individual comes into contact with another unknown or lit- 
tle known individual. While theories of affiliation have not been 
developed mathematically to the extent of the RTM, affiliation 
can be considered from the perspective of incentive and probabil- 
ity of success (Heckhausen and Heckhausen, 2008). In contrast 
to success-motivated individuals, individuals high in affiliation 
motivation may select goals with a higher probability of success 
and/or lower incentive. This often counter-intuitive preference 
can be understood as avoiding public competition and conflict. 
Affiliation motivation is thus an important balance to power 
motivation, but can also lead to individuals with high affilia- 
tion motivation underperforming their achievement motivated 
colleagues. 

Power motivation 

Power can be described as a domain-specific relationship between 
two individuals, characterized by the asymmetric distribu- 
tion of social competence, access to resources or social status 
(Heckhausen and Heckhausen, 2008). Power is manifested by 
unilateral behavioral control and can occur in a number of differ- 
ent ways. Types of power include reward power, coercive power, 
legitimate power, referent power, expert power, and informational 
power. As with affiliation, power motivation can be considered 
with respect to incentive and probability of success. Specifically, 
there is evidence to indicate that the strength of satisfaction of the 
power motive depends solely on incentive and is unaffected by 
the probability of success (McClelland and Watson, 1973). Power 
motivated individuals select high-incentive goals, as achieving 
these goals gives them significant control of the resources and 
reinforcers of others. 

Computational models of achievement, affiliation, and power 
motivation 

Previous work has modeled incentive-based motivation functions 
computationally for agents with power, achievement, and affilia- 
tion motive profiles making one-off decisions (Merrick and Shafi, 
2011). For example, Figure 3 shows a possible computational 
motive profile as a sum of three curves for achievement, affilia- 
tion, and power motivation. Unlike utility functions, motivation 
functions may be non-monotonic and non-concave. The highest 
peak indicates the level of incentive I that produces the strongest 
resultant motivational tendency m(I) for action. Assuming a 
[0, 1] scale for incentive, agents are qualitatively classified as 
power, achievement or affiliation motivated if their optimally 
motivating incentive is high, moderate or low, respectively. 

MATERIALS AND METHODS 

The previous section establishes that individuals can view incentives 
differently. Broadly speaking, individuals with strong power, 
achievement, or affiliation motivation may favor high, moderate, and low 
incentives, respectively. In a game theoretic setting this suggests 
that individuals may not play an explicitly described game, but 
rather act in response to their own idiosyncratic payoff matrix. 
This phenomenon is not captured by classical game theory or util- 
ity based models because of the non-monotonic and non-concave 
nature of motivation functions. 

Our approach in this paper brings the idea of a non- 
monotonic intrinsic motivation function to game theory by 
modeling players as having different "optimally motivating incen- 
tives." Optimally motivating incentives are scalar values that rep- 
resent different motive profiles in a compressed form. Formally, 
suppose we have two agents $A_1$ and $A_2$ playing a mixed-motive 
game G. We denote the optimally motivating incentive of $A_1$ as 
$I_1^*$ and the optimally motivating incentive of $A_2$ as $I_2^*$. $I_j^*$ is thus 
the value that maximizes the motivation function $m_j(I)$ of agent 
$A_j$. This paper is not concerned further with the definition of the 
function m. We focus instead on the game transformations that 
result from introducing $I_j^*$. 

As we have seen, in a two-by-two game, there are four possible 
outcomes: (C, C), (D, D), (C, D), and (D, C). The incentive values 
for each possible outcome from the perspective of the player 
playing the first listed action are I = R, I = P, I = S, or I = T 
(see section Mixed-Motive Games and Matrix 1). Suppose each 
agent $A_j$ wishes to adopt a strategy that results in an outcome that 
minimizes the difference between I and their individual optimally 
motivating incentive $I_j^*$. That is, each agent wishes to minimize 
$|I - I_j^*|$. This means that agents with different values of $I_j^*$ will 
perceive the incentives T, S, R, and P differently. 

We define perceived incentive $I_j'$ as a measure of the perceived 
value of a particular incentive I, for a particular agent $A_j$. If we 
further suppose that the maximum perceived incentive must be 
equal to the maximum incentive $I^{max}$ in the original game, then 
we can formalize the notion of perceived incentive $I_j'$ as: 

$$I_j' = I^{max} - |I - I_j^*|$$

That is, perceived incentive is equal to maximum incentive minus 
the error between actual and optimal incentive. This means that 
$I^{max}$ only has the highest perceived value if it is closest to the 
agent's optimally motivating incentive $I_j^*$. In practice the implications 
are that each incentive I will be perceived differently by 
agents with different optimally motivating incentives $I_j^*$. In addition, 
the highest actual incentive may not be the highest perceived 
incentive for all agents. 

We can now define the perceived incentives $T'$, $P'$, $S'$, and $R'$ of 
each incentive in the original game. In PD, Leader, and Chicken 
the maximum incentive is $I^{max} = T$ so we have: 

$$T_j' = T - |T - I_j^*| \qquad R_j' = T - |R - I_j^*|$$
$$P_j' = T - |P - I_j^*| \qquad S_j' = T - |S - I_j^*|$$

This gives us the perceived game G' in Matrix 2. For BoS the 
maximum incentive is $I^{max} = S$ giving: 

$$S_j' = S - |S - I_j^*| \qquad T_j' = S - |T - I_j^*|$$
$$R_j' = S - |R - I_j^*| \qquad P_j' = S - |P - I_j^*|$$

This produces the perceived game G' in Matrix 3. The next 
sections examine these perceived games when different values 
of $I_j^*$ are assumed. We show that the games transform further 
into a series of new games with different NE depending on the 
value of $I_j^*$. There are numerous possible transformations of the 
game, but the remainder of this section focuses in theory on 
three cases of interest corresponding to individuals with strong 
power, achievement, and affiliation motivation. The simulations 
in section Results consider the intermediate cases as well. 
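
The perceived-incentive transformation can be sketched in a few lines of Python. The function names, the dictionary layout and the numeric payoffs below are assumptions made for illustration. The maximum incentive is taken as the largest of T, R, P and S, which is T for PD, Leader and Chicken and S for BoS, as stated in the text.

```python
def perceived_incentive(incentive, optimal, max_incentive):
    """Perceived value I' = I_max - |I - I*| of one incentive for one agent."""
    return max_incentive - abs(incentive - optimal)


def perceived_payoffs(T, R, P, S, optimal):
    """Perceived T', R', P', S' for an agent whose optimally motivating
    incentive is `optimal`."""
    i_max = max(T, R, P, S)  # T for PD, Leader and Chicken; S for BoS
    return {name: perceived_incentive(value, optimal, i_max)
            for name, value in (("T", T), ("R", R), ("P", P), ("S", S))}


def perceived_game(T, R, P, S, optimal_1, optimal_2):
    """Perceived payoff pairs keyed by (action of A1, action of A2)."""
    a1 = perceived_payoffs(T, R, P, S, optimal_1)
    a2 = perceived_payoffs(T, R, P, S, optimal_2)
    return {
        ("D", "D"): (a1["P"], a2["P"]),
        ("D", "C"): (a1["T"], a2["S"]),
        ("C", "D"): (a1["S"], a2["T"]),
        ("C", "C"): (a1["R"], a2["R"]),
    }


# Hypothetical PD payoffs on an integer scale, with A1 preferring a high
# incentive (38) and A2 preferring a low incentive (12).
game = perceived_game(T=40, R=30, P=20, S=10, optimal_1=38, optimal_2=12)
print(game[("D", "C")])  # (38, 38)
```

With these example values both agents perceive the (D, C) outcome as the most attractive one, even though its raw payoffs are T = 40 for one agent and S = 10 for the other.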



Matrix 2. Perceived game G' for PD, Leader, and Chicken. 

                        Agent $A_2$ 
                        D                                   C 
Agent $A_1$     D       $T-|P-I_1^*|$, $T-|P-I_2^*|$        $T-|T-I_1^*|$, $T-|S-I_2^*|$ 
                C       $T-|S-I_1^*|$, $T-|T-I_2^*|$        $T-|R-I_1^*|$, $T-|R-I_2^*|$ 


Matrix 3. Perceived game G' for Battle of the Sexes. 

                        Agent $A_2$ 
                        D                                   C 
Agent $A_1$     D       $S-|P-I_1^*|$, $S-|P-I_2^*|$        $S-|T-I_1^*|$, $S-|S-I_2^*|$ 
                C       $S-|S-I_1^*|$, $S-|T-I_2^*|$        $S-|R-I_1^*|$, $S-|R-I_2^*|$ 




TRANSFORMING PRISONER'S DILEMMA 

Using the PD game as an example, we can now consider how a 
game is transformed into new games, depending on the value of 
$I_j^*$. Three cases are considered corresponding to individuals with 
strong power, achievement, and affiliation motivation. 

Case 1 (Power): The first case examines a range of high optimally 
motivating incentives: $T > I_j^* > \frac{1}{2}(T + R)$. We consider 
this range "high" because $I_j^*$ is closest to the maximum incentive 
T. This gives us the following transformation of the PD game 
using Matrix 2 and simplifying the absolute values using the 
assumption that $T > I_j^* > \frac{1}{2}(T + R) > R > P > S$: 

$$T_j' = T - (T - I_j^*) = I_j^* \quad (3)$$
$$R_j' = T - (I_j^* - R) = T + R - I_j^* \quad (4)$$
$$P_j' = T - (I_j^* - P) = T + P - I_j^* \quad (5)$$
$$S_j' = T - (I_j^* - S) = T + S - I_j^* \quad (6)$$


Theorem 1. For a PD game G with T > R > P > S, when $T > I_j^* > \frac{1}{2}(T + R)$ 
the perceived game G' is still a valid PD with 
$T_j' > R_j' > P_j' > S_j'$. 

Proof. If we assume $R_j' > T_j'$ then we have $T + R - I_j^* > I_j^*$ which 
simplifies to $\frac{1}{2}(T + R) > I_j^*$. This contradicts the assumption 
that $T > I_j^* > \frac{1}{2}(T + R)$ so it must be true that $T_j' > R_j'$. If we 
assume that $P_j' > R_j'$ then we have $T + P - I_j^* > T + R - I_j^*$ or 
P > R which contradicts the definition of PD. Thus, it must be 
true that $R_j' > P_j'$. Likewise, if we assume that $S_j' > P_j'$ then we 
have $T + S - I_j^* > T + P - I_j^*$ which simplifies to S > P which 
contradicts the definition of PD. Thus, it must be true that 
$P_j' > S_j'$. □ 

Case 2 (Achievement): The second case examines a range of
moderate optimally motivating incentives:
$\tfrac{1}{2}(T + R) > I_j^* > R$. In other words, in this case
$I_j^*$ is closest to R. This gives us the same basic transformation
of the PD game as in Case 1 (Equations 3-6), but now yields a
different perceived game, as follows:

Theorem 2. For a PD game G with $T > R > P > S$, when
$\tfrac{1}{2}(T + R) > I_j^* > R$ the perceived game G' has
$R'_j > T'_j$ and $P'_j > S'_j$.

Proof. If we assume $T'_j > R'_j$ then we have
$I_j^* > T + R - I_j^*$ which simplifies to
$I_j^* > \tfrac{1}{2}(T + R)$. This contradicts the assumption in
this case that $\tfrac{1}{2}(T + R) > I_j^*$ so it must be true that
$R'_j > T'_j$. If we assume that $S'_j > P'_j$ then we have
$T + S - I_j^* > T + P - I_j^*$ which simplifies to $S > P$ which
contradicts the definition of PD. Thus, it must be true that
$P'_j > S'_j$. □
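Theorems 1 and 2 can be spot-checked numerically. The sketch below
sweeps $I_j^*$ across the Case 1 and Case 2 ranges for one PD instance
and tests the stated orderings. The payoff values and the helper name
`pd_perceived` are assumptions made for illustration, and a numerical
sweep is only a sanity check, not a substitute for the proofs.

```python
def pd_perceived(T, R, P, S, i_star):
    """Perceived PD payoffs (T', R', P', S') using V' = T - |V - I*|."""
    return tuple(T - abs(v - i_star) for v in (T, R, P, S))


# Hypothetical PD payoffs with T > R > P > S.
T, R, P, S = 4.0, 3.0, 2.0, 1.0

# Case 1: T > I* > (T + R) / 2, sampled at 3.55, 3.60, ..., 3.95.
case1_ok = True
for k in range(1, 10):
    t, r, p, s = pd_perceived(T, R, P, S, 3.5 + 0.05 * k)
    case1_ok = case1_ok and (t > r > p > s)   # still a valid PD

# Case 2: (T + R) / 2 > I* > R, sampled at 3.05, 3.10, ..., 3.45.
case2_ok = True
for k in range(1, 10):
    t, r, p, s = pd_perceived(T, R, P, S, 3.0 + 0.05 * k)
    case2_ok = case2_ok and (r > t and p > s)  # no longer a PD
```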

Case 3 (Affiliation): The third case examines a range of low
optimally motivating incentives: $\tfrac{1}{2}(P + S) > I_j^* > S$.
We consider this range "low" because $I_j^*$ is closest to S. This
gives us the following transformation of the PD game using Matrix 2
and simplifying absolute values:

$T'_j = T - (T - I_j^*) = I_j^*$
$R'_j = T - (R - I_j^*) = T + I_j^* - R$
$P'_j = T - (P - I_j^*) = T + I_j^* - P$
$S'_j = T - (I_j^* - S) = T + S - I_j^*$



Theorem 3. For a PD game G with $T > R > P > S$, when
$\tfrac{1}{2}(P + S) > I_j^* > S$ the perceived game G' has
$S'_j > P'_j > R'_j > T'_j$.

Proof. If we assume $P'_j > S'_j$ then we have
$T + I_j^* - P > T + S - I_j^*$ which simplifies to
$I_j^* > \tfrac{1}{2}(P + S)$. This contradicts the assumption that
$\tfrac{1}{2}(P + S) > I_j^*$. Thus, it must be true that
$S'_j > P'_j$. If we assume $R'_j > P'_j$ then we have
$T + I_j^* - R > T + I_j^* - P$ which simplifies to $P > R$. This
contradicts the definition of PD. Thus, it must be true that
$P'_j > R'_j$. Likewise, if we assume $T'_j > R'_j$ then we have
$I_j^* > T + I_j^* - R$ which simplifies to $R > T$. This contradicts
the definition of PD. Thus, it must be true that $R'_j > T'_j$. □

The three cases above result in a number of different perceived
games. Case 1 still results in a valid PD game, but in Case 2 and
Case 3 the perceived games are new games. An example of the
payoff structure of the new perceived game from Case 2 is visualized
in Figure 4A. In this game $E_1(D) > E_1(C)$ for $P_2(C) = 0$ and
$E_1(D) < E_1(C)$ for $P_2(C) = 1$. $E_1(D)$ and $E_1(C)$ intersect at:

$P_2(C) = \frac{P'_j - S'_j}{R'_j - T'_j + P'_j - S'_j} = M$

There are now two pure NE and the strategy that emerges depends
on the initial values of $P_1(C)$ and $P_2(C)$. If
$P_1(C) + P_2(C) > 2M$ at t = 0 then the (C, C) equilibrium will
emerge. Alternatively, if $P_1(C) + P_2(C) < 2M$ at t = 0 then the
(D, D) equilibrium will emerge.

In Case 3 the agents also do not perceive a PD game. The perceived
game in this case is visualized in Figure 4B. In this game
$E_1(C) > E_1(D)$ for all $P_2(C)$. The (C, C) strategy is now
dominant, indicating that the agents will tend to evolve cooperative
(C, C) strategies over time.

FIGURE 4 | Visualization of the Prisoner's Dilemma game when
perceived by agents with optimally motivating incentives of (A)
$\tfrac{1}{2}(T + R) > I_j^* > R$ and (B)
$\tfrac{1}{2}(P + S) > I_j^* > S$. The pure strategy Nash
Equilibria (NE) are circled.

TRANSFORMING LEADER

We can follow the same process to construct perceived versions of
Leader.

Case 1 (Power): The first case again examines a range of high
optimally motivating incentives: $T > I_j^* > \tfrac{1}{2}(T + S)$.
This gives us the same basic transformations in Equations 3-6, and
the perceived game is still a Leader game.

Theorem 1. In a Leader game G with $T > S > R > P$, when
$T > I_j^* > \tfrac{1}{2}(T + S)$ the perceived game G' is still a
valid Leader game $T'_j > S'_j > R'_j > P'_j$.

Proof. If we assume $S'_j > T'_j$ then we have
$T + S - I_j^* > I_j^*$ which simplifies to
$\tfrac{1}{2}(T + S) > I_j^*$. This contradicts the assumption in
this case that $T > I_j^* > \tfrac{1}{2}(T + S)$ so it must be true
that $T'_j > S'_j$. If we assume that $R'_j > S'_j$ then we have
$T + R - I_j^* > T + S - I_j^*$ which simplifies to $R > S$ which
contradicts the definition of Leader. Thus, it must be true that
$S'_j > R'_j$. Likewise, if we assume that $P'_j > R'_j$ then we have
$T + P - I_j^* > T + R - I_j^*$ which simplifies to $P > R$ which
contradicts the definition of Leader. Thus, it must be true that
$R'_j > P'_j$. □

Case 2 (Achievement): The second case examines a range of
moderate-high optimally motivating incentive:
$\tfrac{1}{2}(T + S) > I_j^* > S$. This also gives us the
transformations in Equations 3-6, but the perceived game is no longer
a Leader game. In fact, a number of interesting variations occur:



Lemma 1. In a Leader game G with $T > S > R > P$, when
$\tfrac{1}{2}(T + S) > I_j^* > S$ the perceived game G' has
$S'_j > T'_j$ and $R'_j > P'_j$.

Proof. If we assume $T'_j > S'_j$ then we have
$I_j^* > T + S - I_j^*$ which simplifies to
$I_j^* > \tfrac{1}{2}(T + S)$. This contradicts the assumption in
this case that $\tfrac{1}{2}(T + S) > I_j^*$ so it must be true that
$S'_j > T'_j$. If we assume that $P'_j > R'_j$ then we have
$T + P - I_j^* > T + R - I_j^*$ which simplifies to $P > R$ which
contradicts the definition of Leader. Thus, it must be true that
$R'_j > P'_j$. □

Theorem 2. In a Leader game G with $T > S > R > P$, when
$\tfrac{1}{2}(T + S) > I_j^* > S$ and $I_j^* > \tfrac{1}{2}(T + R)$
the perceived game G' is a BoS game $S'_j > T'_j > R'_j > P'_j$.

Proof. $S'_j > T'_j$ and $R'_j > P'_j$ by Lemma 1.
$I_j^* > \tfrac{1}{2}(T + R)$ expands to $I_j^* > T + R - I_j^*$.
Substitution of Equations 3-4 gives us $T'_j > R'_j$. □



Theorem 3. In a Leader game G with $T > S > R > P$, when
$\tfrac{1}{2}(T + S) > I_j^* > S$ and $I_j^* < \tfrac{1}{2}(T + R)$
the perceived game G' is $S'_j > R'_j > T'_j > P'_j$.

Proof. $S'_j > T'_j$ and $R'_j > P'_j$ by Lemma 1.
$I_j^* < \tfrac{1}{2}(T + R)$ expands to $I_j^* < T + R - I_j^*$.
Substitution of Equations 3-4 gives us $T'_j < R'_j$. □



Case 3 (Affiliation): The third case examines a range of low
optimally motivating incentives: $\tfrac{1}{2}(R + P) > I_j^* > P$.
This gives us the following transformation:

$T'_j = T - |T - I_j^*| = I_j^*$    (7)
$R'_j = T - |R - I_j^*| = T + I_j^* - R$    (8)
$P'_j = T - |I_j^* - P| = T + P - I_j^*$    (9)
$S'_j = T - |S - I_j^*| = T + I_j^* - S$    (10)



Theorem 4. In a Leader game G with $T > S > R > P$, when
$\tfrac{1}{2}(R + P) > I_j^* > P$ the perceived game G' is
$P'_j > R'_j > S'_j > T'_j$.

Proof. If we assume $R'_j > P'_j$ we have
$T + I_j^* - R > T + P - I_j^*$ which simplifies to
$I_j^* > \tfrac{1}{2}(R + P)$ which contradicts the assumption that
$\tfrac{1}{2}(R + P) > I_j^*$. Thus, it must be true that
$P'_j > R'_j$. If we assume $S'_j > R'_j$ we have
$T + I_j^* - S > T + I_j^* - R$ or $R > S$ which contradicts the
definition of Leader. Thus, it must be true that $R'_j > S'_j$.
Likewise, if we assume $T'_j > S'_j$ we have $I_j^* > T + I_j^* - S$
or $S > T$ which contradicts the definition of Leader. Thus, it must
be true that $S'_j > T'_j$. □



TRANSFORMING CHICKEN 

We can follow the same process again to construct the perceived 
versions of Chicken. Proofs are omitted for brevity. 



Case 1 (Power): The first case again assumes a high optimally
motivating incentive: $T > I_j^* > \tfrac{1}{2}(T + R)$. This gives
us the transformation in Equations 3-6, and the perceived game is a
Chicken game:

Theorem 1. For a Chicken game G with $T > R > S > P$, when
$T > I_j^* > \tfrac{1}{2}(T + R)$ the perceived game G' is still a
valid Chicken game $T'_j > R'_j > S'_j > P'_j$.

Proof. Omitted. □

Case 2 (Achievement): The second case again assumes a
moderate-high optimally motivating incentive:
$\tfrac{1}{2}(T + R) > I_j^* > R$. This also gives us the
transformation in Equations 3-6, but the perceived game is no longer
a Chicken game:

Theorem 2. For a Chicken game G with $T > R > S > P$, when
$\tfrac{1}{2}(T + R) > I_j^* > R$ the perceived game G' has
$R'_j > T'_j$ and $S'_j > P'_j$.

Proof. Omitted. □



Case 3 (Affiliation): The third case again assumes a low optimally
motivating incentive: $\tfrac{1}{2}(S + P) > I_j^* > P$. This gives
us the transformations in Equations 7-10.

Theorem 3. For a Chicken game G with $T > R > S > P$, when
$\tfrac{1}{2}(S + P) > I_j^* > P$ the perceived game G' is
$P'_j > S'_j > R'_j > T'_j$.

Proof. Omitted. □



TRANSFORMING BATTLE OF THE SEXES 

Finally, we can follow the process above to construct the perceived 
versions of BoS. 

Case 1 (Power): The first case again assumes a high optimally
motivating incentive: $S > I_j^* > \tfrac{1}{2}(T + S)$. This gives
us the following transformation of the BoS game:

$T'_j = S - (I_j^* - T) = S + T - I_j^*$    (11)
$R'_j = S - (I_j^* - R) = S + R - I_j^*$    (12)
$P'_j = S - (I_j^* - P) = S + P - I_j^*$    (13)
$S'_j = S - (S - I_j^*) = I_j^*$    (14)

Theorem 1. For a BoS game G with $S > T > R > P$, when
$S > I_j^* > \tfrac{1}{2}(T + S)$ the perceived game G' is still a
valid BoS game $S'_j > T'_j > R'_j > P'_j$.

Proof. Omitted. □



Case 2 (Achievement): The second case again assumes a
moderate-high optimally motivating incentive:
$\tfrac{1}{2}(T + S) > I_j^* > T$. This gives us the transformation
of the BoS game in Equations 11-14, but the perceived game is no
longer a BoS.

Lemma 1. For a BoS game G with $S > T > R > P$, when
$\tfrac{1}{2}(T + S) > I_j^* > T$ the perceived game G' has
$T'_j > S'_j$ and $R'_j > P'_j$.

Proof. If we assume $S'_j > T'_j$ then we have
$I_j^* > S + T - I_j^*$ which simplifies to
$I_j^* > \tfrac{1}{2}(T + S)$ which contradicts the assumption
that $\tfrac{1}{2}(T + S) > I_j^*$. Thus, it must be true that
$T'_j > S'_j$. If we assume $P'_j > R'_j$ then we have
$S + P - I_j^* > S + R - I_j^*$ which simplifies to $P > R$ which
contradicts the definition of BoS. Thus, it must be true that
$R'_j > P'_j$. □



Theorem 2. For a BoS game G with $S > T > R > P$, when
$\tfrac{1}{2}(T + S) > I_j^* > T$ and $I_j^* > \tfrac{1}{2}(S + R)$
the perceived game G' is a Leader game $T'_j > S'_j > R'_j > P'_j$.

Proof. $T'_j > S'_j$ and $R'_j > P'_j$ by Lemma 1.
$I_j^* > \tfrac{1}{2}(S + R)$ expands to $I_j^* > S + R - I_j^*$.
Substitution of Equations 14 and 12 gives us $S'_j > R'_j$. □

Theorem 3. For a BoS game G with $S > T > R > P$, when
$\tfrac{1}{2}(T + S) > I_j^* > T$ and $I_j^* < \tfrac{1}{2}(S + R)$
the perceived game G' is a Chicken game $T'_j > R'_j > S'_j > P'_j$.

Proof. $T'_j > S'_j$ and $R'_j > P'_j$ by Lemma 1.
$I_j^* < \tfrac{1}{2}(S + R)$ expands to $I_j^* < S + R - I_j^*$.
Substitution of Equations 14 and 12 gives us $S'_j < R'_j$. □

Case 3 (Affiliation): The third case again assumes a low optimally
motivating incentive: $\tfrac{1}{2}(R + P) > I_j^* > P$. This gives
us the following transformation of the BoS game:

$T'_j = S - (T - I_j^*) = S + I_j^* - T$
$R'_j = S - (R - I_j^*) = S + I_j^* - R$
$P'_j = S - (I_j^* - P) = S + P - I_j^*$
$S'_j = S - (S - I_j^*) = I_j^*$

Theorem 4. For a BoS game G with $S > T > R > P$, when
$\tfrac{1}{2}(R + P) > I_j^* > P$ the perceived game G' is
$P'_j > R'_j > T'_j > S'_j$.

Proof. If we assume $R'_j > P'_j$ then we have
$S + I_j^* - R > S + P - I_j^*$ or $I_j^* > \tfrac{1}{2}(R + P)$
which contradicts the assumption that
$\tfrac{1}{2}(R + P) > I_j^*$. Thus, it must be true that
$P'_j > R'_j$. If we assume that $T'_j > R'_j$ then we have
$S + I_j^* - T > S + I_j^* - R$ or $R > T$ which contradicts the
definition of BoS. Thus, it must be true that $R'_j > T'_j$.
Likewise, if we assume that $S'_j > T'_j$ then we have
$I_j^* > S + I_j^* - T$ or $T > S$ which contradicts the definition
of BoS. Thus, it must be true that $T'_j > S'_j$. □

FIGURE 5 | Simulation of one hundred pairs of agents playing thirty
iterations of the Prisoner's Dilemma game. All agents have
$I_j^* = 4.0$, but initial values of $p_c$ and $q_c$ are randomized.

RESULTS 

This section presents simulations of each of the four games
studied in section Materials and Methods played by agents with 
optimally motivating incentives conforming to the three cases 
studied, as well as the intermediate cases not studied above. 
We use two-population replicator dynamics to model cultural 
learning when mixed-motive games are played iteratively. We 
demonstrate that individuals with different optimally motivating 
incentives may adopt different strategies when playing a particu- 
lar game, or may learn at different rates. We also discuss how the 
NE of the transformed games reflects a number of results from 
human experiments that are not well-modeled by the NE of the 
original game. 
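As a rough illustration of the simulation set-up described above, the
following Python sketch iterates a discrete-time two-population
replicator update on the perceived payoffs. Equations 1 and 2 are not
reproduced in this section, so the update rule used here (a standard
replicator step with a hypothetical learning rate `rate`) is an
assumption, as are the function names and the PD payoff values.

```python
def perceived_game(payoffs, i_star):
    """Perceived payoffs V' = V_max - |V - I*| for one agent."""
    v_max = max(payoffs.values())
    return {k: v_max - abs(v - i_star) for k, v in payoffs.items()}


def expected(game, prob_c_other):
    """Expected perceived payoff of C and of D against an opponent
    who plays C with probability prob_c_other."""
    e_c = prob_c_other * game["R"] + (1 - prob_c_other) * game["S"]
    e_d = prob_c_other * game["T"] + (1 - prob_c_other) * game["P"]
    return e_c, e_d


def simulate(payoffs, i1, i2, p0, q0, steps=30, rate=0.5):
    """Assumed replicator step: dp = rate * p * (1 - p) * (E(C) - E(D)).

    p and q are the probabilities that A1 and A2 choose C.
    """
    g1, g2 = perceived_game(payoffs, i1), perceived_game(payoffs, i2)
    p, q = p0, q0
    for _ in range(steps):
        e1c, e1d = expected(g1, q)
        e2c, e2d = expected(g2, p)
        new_p = p + rate * p * (1 - p) * (e1c - e1d)
        new_q = q + rate * q * (1 - q) * (e2c - e2d)
        # Keep both probabilities inside [0, 1].
        p = min(1.0, max(0.0, new_p))
        q = min(1.0, max(0.0, new_q))
    return p, q


# Hypothetical PD payoffs (T > R > P > S), illustrative values only.
pd_game = {"T": 4.0, "R": 3.0, "P": 2.0, "S": 1.0}
power_pair = simulate(pd_game, 4.0, 4.0, p0=0.5, q0=0.5)
affiliation_pair = simulate(pd_game, 1.2, 1.2, p0=0.5, q0=0.5)
```

Under these assumptions the pair with the high incentive drifts toward
(D, D) and the pair with the low incentive drifts toward (C, C), which
is the qualitative pattern the simulations below report.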

PRISONER'S DILEMMA

Figures 5, 6 use the two population replicator dynamics in
Equations 1 and 2 to simulate one hundred pairs of agents (A1
and A2) playing the iterated PD (IPD) game:



The initial probabilities $p_c$ (for agents A1) and $q_c$ (for agents
A2) are randomized and the agent pairs learn while playing thirty
consecutive games. A range of [1, 4] is assumed for incentive. The
lines in Figure 5 trace the learned values of $p_c$ and $q_c$ over
time. In Figure 5 all agents have a "high" optimally motivating
incentive $I_1^* = I_2^* = 4.0$, representing power-motivated
individuals. We see that the perceived games are identical to the
original game, i.e., $G'_1 = G'_2 = G$, and all agent pairs tend to
converge on the (D, D) equilibrium over time.

In Figure 6 the agents share progressively lower values of $I_1^*$
and $I_2^*$, ranging from $I_1^* = I_2^* = 3.8$ in Figure 6A to
$I_1^* = I_2^* = 1.0$ in Figure 6O. Figures 6A,B show Case 1 games in
which the (D, D) outcome emerges as the equilibrium as predicted by
Theorem 2.1.1. These agents still perceive a PD game. In contrast,
Figures 6C,D show Case 2 games in which some agents converge
on the (C, C) equilibrium and some on the (D, D) equilibrium, as
predicted by Theorem 2.1.2. The equilibrium approached by the
agent pairs in this case depends on their initial values of $p_c$ and
$q_c$. In Figures 6E-L the (C, C) outcome becomes more frequent
as the values of $I_1^*$ and $I_2^*$ decrease. Figures 6M,N show
Case 3 games in which all agents converge on the (C, C) equilibrium
as predicted by Theorem 2.1.3.

In general, these results support the idea proposed by Johnson 
et al. (2002), that individual variation means that true PD scenar- 
ios occur relatively infrequently in nature. Johnson et al. (2002) 
show that if there is variance in perception of twice the pay- 
off interval in a linear PD game (a game in which the intervals 
between T, R, S, and P are the same) then only 15.8% remain 
valid PD games. Our transformations show that a true PD scenario
will only occur if both agents have optimally motivating
incentives that fall in the range
$T > I_j^* > \tfrac{1}{2}(T + R)$. If we assume $I_j^*$ can only
fall within the range $T > I_j^* > S$, the fraction v of valid PD
games will be:

$v = \frac{T - \tfrac{1}{2}(T + R)}{T - S} = \frac{T - R}{2(T - S)}$

In a linear PD game $3(T - R) = (T - S)$ so $v = 1/6 \approx 16.7\%$
if we assume a uniform distribution of optimally motivating
incentives. This is, qualitatively speaking, similar to the result
proposed by Johnson et al. (2002), and offers support for our
methodology for modeling differences in motivations.
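The arithmetic behind the 1/6 figure can be reproduced exactly with
rational numbers. The sketch below is illustrative: the function name
and the linear PD payoffs (T = 4, R = 3, S = 1) are assumptions, and
the calculation takes $I_j^*$ to be uniformly distributed between S
and T as in the text.

```python
from fractions import Fraction


def valid_pd_fraction(T, R, S):
    """Share of the interval (S, T) covered by the Case 1 range
    (T + R) / 2 < I* < T, i.e. v = (T - R) / (2 * (T - S))."""
    case1_width = Fraction(T) - Fraction(T + R, 2)
    full_width = Fraction(T) - Fraction(S)
    return case1_width / full_width


# Hypothetical linear PD with integer payoffs: 3 * (T - R) == T - S.
v = valid_pd_fraction(T=4, R=3, S=1)
```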

Case 1 and Case 2 also provide computational insight into
some of the findings reported by Terhune (1968). Terhune
observed pairs of humans classified as either power, affiliation
or achievement motivated playing single-shot and iterative
PD games in controlled conditions. One of these experiments
observed the influence of the first trial outcome on different
types of people. He found that if the first outcome was (C, C),
pairs of achievement motivated individuals had the highest
subsequent proportion of (C, C) outcomes (46.8%). In contrast,
power motivated individuals had (C, C) outcomes only 9.4%
of the time after a (C, C) outcome on the first trial. In other
words, people with different motives respond differently to the
same experience (in this case the first trial outcome). The results
above suggest that this can be captured computationally using
our model by using high values of $I_j^*$ for power motivated
individuals, so that they tend to perceive a Case 1 game, and lower
values of $I_j^*$ for achievement motivated individuals, so that they
tend to perceive a Case 2 game. A further discussion of this
avenue for future work is made in section Human-Computer
Interaction.

FIGURE 6 | Simulations of one hundred pairs of agents playing thirty
iterations of the Prisoner's Dilemma game. Agents share different
values of $I_j^*$ in each simulation. (A) $I_j^* = 3.8$; (B)
$I_j^* = 3.6$; (C) $I_j^* = 3.4$; (D) $I_j^* = 3.2$; (E)
$I_j^* = 3.0$; (F) $I_j^* = 2.8$; (G) $I_j^* = 2.6$; (H)
$I_j^* = 2.4$; (I) $I_j^* = 2.2$; (J) $I_j^* = 2.0$; (K)
$I_j^* = 1.8$; (L) $I_j^* = 1.6$; (M) $I_j^* = 1.4$; (N)
$I_j^* = 1.2$; (O) $I_j^* = 1.0$. Initial values of $p_c$ and $q_c$
are randomized. See Figure 5 for legend.



The Case 3 result is perhaps less instructive from a human
modeling perspective, but is still useful from an artificial systems
perspective. If we wish to design agents that will cooperate
when faced with PD situations, then we can use agents with low
optimally motivating incentives in the range
$\tfrac{1}{2}(P + S) > I_j^* > S$. These agents perceive a game with
a dominant (C, C) strategy and will thus tend to evolve cooperative
strategies over time. Likewise, if we wish to model "martyrs" then
an agent A1 with $\tfrac{1}{2}(P + S) > I_1^* > S$ will be a martyr
(C chooser) when playing an agent A2 with
$T > I_2^* > \tfrac{1}{2}(T + R)$. This type of personality
modeling has application to areas such as believable non-player
characters (NPCs) in computer games.

LEADER 

If we consider Case 1 (power-motivated) agents playing the Leader
game, we see that $E_1(C) > E_1(D)$ for $P_2(C) = 0$ and
$E_1(D) > E_1(C)$ for $P_2(C) = 1$. $E_1(C)$ and $E_1(D)$ intersect
at the point:

$P_2(C) = \frac{S - P}{2I^* + S - P - T - R}$

Now, suppose we have two pairs of players. The first pair of players
have optimally motivating incentives $I_1^* = I_2^* = I_j^*$. The
second pair of players have optimally motivating incentives
$I_1^* = I_2^* = I_k^*$ such that $I_j^* > I_k^*$. Substitution
gives us

$\frac{S - P}{2I_j^* + S - P - T - R} < \frac{S - P}{2I_k^* + S - P - T - R}$

That is, $P_j(C) < P_k(C)$. In other words, the probability of
conceding right of way increases in games between players with weaker
power motivation, although the equilibria are still at (C, D)
and (D, C) as indicated by Theorem 2.2.1. This phenomenon is
evident in the simulations in Figure 7. Figure 7 uses the two
population replicator dynamics in Equations 1 and 2 to simulate one
hundred pairs of learning agents (A1 and A2) playing the Leader
game:



The Case 1 simulations are shown in Figures 7A,B and the trend
to concede is evident in the progressively less direct paths the
agents take to the equilibria. As $I_j^*$ is further decreased in
Case 2 (achievement motivated agents), two types of perceived games
occur. Either the game is perceived as a BoS game (Theorem 2.2.3),
or as a game with a dominant (C, C) strategy (Theorem 2.2.4).

The Leader game is perceived as a BoS game when
$\tfrac{1}{2}(T + S) > I_j^* > S$ and
$I_j^* > \tfrac{1}{2}(T + R)$. The payoff structure for a BoS game
is visualized in Figure 2C. Figures 7C,D simulate the behavior of
agents that perceive a Leader game as a BoS game. The paths taken
to the (C, D) and (D, C) equilibria by these agents are quite
indirect as both are initially motivated to concede right of way by
their perception of leadership as an act of sacrifice. Leader-follower
behavior [(C, D) or (D, C)] does emerge, but it does so more
slowly than for agents with high values of $I_j^*$.

Figures 7E-J show simulations of games between agents with
$S > I_j^* > R$. These agents perceive games of the form
$S'_j > R'_j > T'_j > P'_j$ with dominant (C, C) strategies. As a
result, leadership behavior does not emerge as an equilibrium as the
agents always concede right of way. In Case 3 (affiliation motivated
agents) there are two pure equilibria in the perceived game: (D, D)
and (C, C).

The Case 3 payoff structure is simulated in Figures 7M,N. The
emergent equilibrium strategy for any pair of agents depends
on the initial values of $P_1(C)$ and $P_2(C)$. If
$P_1(C) + P_2(C) > 2M$ at t = 0 then the (C, C) equilibrium will
occur over time. Alternatively, if $P_1(C) + P_2(C) < 2M$ at t = 0
then the (D, D) equilibrium will occur over time. These pure strategy
equilibria preclude the emergence of leader-follower behavior and
result, instead, in collisions (both players driving) or
procrastination (both players conceding right of way). Thus, to
achieve leaders and followers, agents with high values of $I_j^*$ are
required.
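The Case 1 comparison between two pairs of players can be checked with
a few lines of Python. The sketch evaluates the intersection point
(S - P) / (2I* + S - P - T - R) for two incentive values in the Case 1
range. The function name and the Leader payoffs (T = 4, S = 3, R = 2,
P = 1) are hypothetical and serve only to illustrate the inequality.

```python
def leader_case1_intersection(T, S, R, P, i_star):
    """Value of P_2(C) at which E_1(C) = E_1(D) for a Case 1 Leader game."""
    if not (T > i_star > (T + S) / 2):
        raise ValueError("i_star must satisfy T > I* > (T + S) / 2")
    return (S - P) / (2 * i_star + S - P - T - R)


# Hypothetical Leader payoffs with T > S > R > P.
T, S, R, P = 4.0, 3.0, 2.0, 1.0

stronger = leader_case1_intersection(T, S, R, P, i_star=3.8)  # I_j*
weaker = leader_case1_intersection(T, S, R, P, i_star=3.6)    # I_k*
```

For these values the pair with the weaker power motivation has the
larger intersection point, in line with the inequality above.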

CHICKEN 

In the Chicken game, Case 1 (power-motivated) agents also perceive
a valid Chicken game resulting in the emergence of an
"exploiter" agent. However, with a small reduction in $I_j^*$, Case 2
(achievement motivated) agents perceive a transformed game in
which the more cautious (C, C) strategy is dominant (Theorem
2.3.2). This is, in fact, the most common perceived game, covering
$\tfrac{1}{2}(T + R) > I_j^* > \tfrac{1}{2}(S + P)$. This can be
thought of as reflecting the real-world reluctance to engage in a
game of Chicken, which is in principle the same as playing and
choosing C (Colman, 1982).

The prevalence of the perceived dominant (C, C) strategy is
evidenced in the simulations in Figure 8. Figure 8 uses the two
population replicator dynamics in Equations 1 and 2 to simulate
one hundred pairs of learning agents (A1 and A2) playing the
Chicken game:

[Chicken payoff matrix; recovered entries: 1, 4 / 2, 3]

Figures 8C-L all show agents approaching the (C, C) equilibrium.
One other case does exist (Case 3) in which the perceived
game has two pure NE: (D, D) and (C, C). The emergent equilibrium
for two agents depends on the initial values of $P_1(C)$ and
$P_2(C)$. If $P_1(C) + P_2(C) > 2M$ at t = 0 then the (C, C)
equilibrium will occur over time. Alternatively, if
$P_1(C) + P_2(C) < 2M$ at t = 0 then the (D, D) equilibrium will
occur over time. These pure strategy equilibria result in either
certain collision (both players driving on) or mutually cautious
behavior (both players swerving to avoid a collision). Examples of
Case 3 agents interacting are shown in Figures 8M,N.

Comparison of Case 1 and Case 3 demonstrates how the same 
outcome may result from different motives. In Case 1 the (D, D) 
outcome results from a preference for high incentives. In Case 3 
the (D, D) outcome results from a preference for low incentives 
to avoid conflict. The strategy clearly backfires, but this sort of 
trend has been observed in a general sense in humans. Individuals 
with high affiliation motivation have been observed to underper- 
form their achievement motivated colleagues precisely because 
their desire to avoid conflict situations often means they also 
miss opportunities to cooperate (Heckhausen and Heckhausen, 
2008). 

FIGURE 7 | Simulations of one hundred pairs of agents playing thirty
iterations of the Leader game. Agents share different values of
$I_j^*$ in each simulation. (A) $I_j^* = 3.8$; (B) $I_j^* = 3.6$;
(C) $I_j^* = 3.4$; (D) $I_j^* = 3.2$; (E) $I_j^* = 3.0$; (F)
$I_j^* = 2.8$; (G) $I_j^* = 2.6$; (H) $I_j^* = 2.4$; (I)
$I_j^* = 2.2$; (J) $I_j^* = 2.0$; (K) $I_j^* = 1.8$; (L)
$I_j^* = 1.6$; (M) $I_j^* = 1.4$; (N) $I_j^* = 1.2$; (O)
$I_j^* = 1.0$. Initial values of $p_c$ and $q_c$ are randomized. See
Figure 5 for legend.

FIGURE 8 | Simulations of one hundred pairs of agents playing thirty
iterations of the Chicken game. Agents share different values of
$I_j^*$ in each simulation. (A) $I_j^* = 3.8$; (B) $I_j^* = 3.6$;
(C) $I_j^* = 3.4$; (D) $I_j^* = 3.2$; (E) $I_j^* = 3.0$; (F)
$I_j^* = 2.8$; (G) $I_j^* = 2.6$; (H) $I_j^* = 2.4$; (I)
$I_j^* = 2.2$; (J) $I_j^* = 2.0$; (K) $I_j^* = 1.8$; (L)
$I_j^* = 1.6$; (M) $I_j^* = 1.4$; (N) $I_j^* = 1.2$; (O)
$I_j^* = 1.0$. Initial values of $p_c$ and $q_c$ are randomized. See
Figure 5 for legend.

BATTLE OF THE SEXES

If we consider Case 1 (power-motivated) agents playing BoS, we
see that $E_1(C) > E_1(D)$ for $P_2(C) = 0$ and $E_1(D) > E_1(C)$
for $P_2(C) = 1$. $E_1(C)$ and $E_1(D)$ intersect at the point:

$P_2(C) = \frac{2I^* - S - P}{2I^* - S - P + T - R}$

Now, suppose we have two pairs of learning agents playing
a BoS game. The first pair of agents has optimally motivating
incentives $I_1^* = I_2^* = I_j^*$. The second pair has optimally
motivating incentives $I_1^* = I_2^* = I_k^*$ such that
$I_j^* < I_k^*$. This implies $P_j(C) < P_k(C)$ as the (T - R) term
in the denominator becomes increasingly significant as $I^*$
decreases. In other words, the probability of choosing C decreases in
agents with lower values of $I^*$ as they begin to perceive the D
choice as a desirable act of leadership rather than as a less
desirable act of sacrifice. This is evident in the simulations in
Figure 9. Figure 9 uses the two population replicator dynamics in
Equations 1 and 2 to simulate one hundred pairs of agents (A1 and A2)
playing the BoS game:




Figures 9A,B show Case 1 simulations while Figures 9C,D show
Case 2 simulations in which the learning agents perceive a
Leader game (Theorem 2.4.3) rather than the original BoS game.
Progressively more direct trajectories towards the (C, D) and
(D, C) outcomes are evident in these simulations as $I_j^*$
decreases.

Figures 9E-G show simulations in which the agents perceive
a Chicken game rather than a BoS game. This is followed by
another change in perception in Figures 9H-L. In these simulations,
and in the Case 3 games in Figures 9M,N, the perceived
games have two pure NE: (D, D) and (C, C). The strategy chosen
by the agents depends on the initial values of $p_c$ and $q_c$. These
pure strategy equilibria result in both players attending
entertainment alone. For the best outcome to emerge, either a "hero,"
a "leader," or a "chicken" personality is required.

FIGURE 9 | Simulations of one hundred pairs of agents playing thirty
iterations of the Battle-of-the-Sexes game. Agents share different
values of $I_j^*$ in each simulation. (A) $I_j^* = 3.8$; (B)
$I_j^* = 3.6$; (C) $I_j^* = 3.4$; (D) $I_j^* = 3.2$; (E)
$I_j^* = 3.0$; (F) $I_j^* = 2.8$; (G) $I_j^* = 2.6$; (H)
$I_j^* = 2.4$; (I) $I_j^* = 2.2$; (J) $I_j^* = 2.0$; (K)
$I_j^* = 1.8$; (L) $I_j^* = 1.6$; (M) $I_j^* = 1.4$; (N)
$I_j^* = 1.2$; (O) $I_j^* = 1.0$. Initial values of $p_c$ and $q_c$
are randomized. See Figure 5 for legend.

STRATEGIC INTERACTIONS BETWEEN AGENTS WITH DIFFERENT 
MOTIVES 

The simulations so far consider pairs of agents with the same
optimally motivating incentives. However, it is also possible to
simulate the outcomes when pairs of learning agents with different
optimally motivating incentives interact. Figures 10A-D
simulate such pairs of agents playing each of the four games, PD,
Leader, Chicken, and BoS, respectively. In each pair, one agent A1
has a high optimally motivating incentive $I_1^* = 3.9$ and the other
A2 has a low optimally motivating incentive $I_2^* = 1.1$.

FIGURE 10 | Simulations of one hundred pairs of agents playing thirty
iterations of (A) the Prisoner's Dilemma game; (B) the Leader game;
(C) the Chicken game; and (D) the Battle-of-the-Sexes game. In each
simulation, one agent in each pair has $I_1^* = 3.9$ and the other
has $I_2^* = 1.1$. Initial values of $p_c$ and $q_c$ are randomized.
See Figure 5 for legend.

The results in Figure 10 show that agents with high optimally
motivating incentive tend to be the "exploiters" in PD and
Chicken games, the "leaders" in a Leader game, and the "heroes"
in a BoS game. In contrast, agents with low optimally motivating
incentive (less than the average of the lowest two payoffs of a
game) tend to be the "martyrs" in a PD game, the "followers" in a
Leader game, the "chickens" in a Chicken game and the "selfish"
in a BoS game.
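For the PD game, the "exploiter" and "martyr" roles described above can
be read directly from the perceived payoffs, without running a
simulation. The following sketch checks which action each agent
perceives as dominant. The helper names and the PD payoff values are
assumptions for illustration; the incentives 3.9 and 1.1 are the values
used in Figure 10.

```python
def perceived_game(payoffs, i_star):
    """Perceived payoffs V' = V_max - |V - I*| for one agent."""
    v_max = max(payoffs.values())
    return {k: v_max - abs(v - i_star) for k, v in payoffs.items()}


def dominant_action(game):
    """Return 'C' or 'D' if that action is strictly better against both
    opponent actions in the perceived game, otherwise None."""
    if game["R"] > game["T"] and game["S"] > game["P"]:
        return "C"
    if game["T"] > game["R"] and game["P"] > game["S"]:
        return "D"
    return None


# Hypothetical PD payoffs (T > R > P > S), illustrative values only.
pd_game = {"T": 4.0, "R": 3.0, "P": 2.0, "S": 1.0}

a1_action = dominant_action(perceived_game(pd_game, 3.9))  # high incentive
a2_action = dominant_action(perceived_game(pd_game, 1.1))  # low incentive
```

Under these assumptions A1 perceives D as dominant and A2 perceives C
as dominant, so the pair settles on the (D, C) outcome in which A1
exploits A2.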

DISCUSSION 

In this paper we have represented agents with an optimally moti- 
vating incentive that influences the way they perceive the pay- 
offs in strategic interactions. By using two-by-two mixed-motive 
games to represent different kinds of strategic interactions, we 
have shown that agents with different optimally motivating incen- 
tives perceive the original game differently. In many cases the 
perceived games have different equilibrium points to the origi- 
nal game. We can draw a number of general conclusions about 
the perceptions of agents with different optimally motivating 
incentives: 

• Agents with high optimally motivating incentive (greater than 
the average of the highest two payoffs of a game) perceive a 
game that still conforms to the conditions defining the original 
game. For example, an agent with high optimally motivating 
incentive playing a PD game will still perceive a valid PD game 
and so on. 

• Agents with moderate or lower optimally motivating incen- 
tive perceive new games that do not conform to the conditions 
defining the original game. This changes the NE and the 
behavior of the agents over time. 

When agents with different optimally motivating incentives 
interact: 

• Agents with high optimally motivating incentive will tend
to be the "exploiters" in PD and Chicken games, the
"leaders" in a Leader game, and the "heroes" in a BoS
game.

• Agents with low optimally motivating incentive (less than the 
average of the lowest two payoffs of a game) will tend to be the 
"martyrs" in a PD game, the "followers" in a Leader game, the 
"chickens" in a Chicken game and the "selfish" in a BoS game. 

The concept of optimally motivating incentive thus provides an 
approach to building artificial agents with different personalities 
using motivation. Personality in this case is expressed through 
behavior. For example, using the language of Colman (1982), 
agents in the simulations in section Results can be interpreted 
as demonstrating behavioral characteristics such as "aggression," 
"leadership," "heroism," "martyrdom," and "caution." This sug- 
gests a number of possible applications including the design of 
more believable agents, human-computer interaction and sim- 
ulation of human decision-making. These are discussed in the 
following sub-sections. 

BELIEVABLE AGENTS 

Agents with distinguishable personalities have applications in 
areas such as animated entertainment where believable agents 
increase the sense of immersion in a virtual environment. 
According to Loyall (1997), believable agents should "allow people
to not just watch, but also interact with. . . powerful, personality- 
rich characters." The work in this paper specifically explores the 
role of intrinsic motivation for artificial agents engaged in social 
interactions. While the experiments in this paper are abstracted 
to the decision-making level, it is feasible to imagine an extension 
of this work in which this decision making controls the animated 
behaviour of a virtual character. 

Some existing work has studied self-motivated behavior such 
as curiosity and novelty-seeking in NPCs in computer games 
(Merrick and Maher, 2009). Merrick and Maher (2009) demon- 
strate that intrinsically motivated reinforcement learning agents 
can learn in open-ended environments by generating goals in 
response to their experiences. The simulations in this paper 
combined optimally motivating incentive with learning using 
replicator dynamics, to complement the analytical description 
of each game transformation. However, in future it is feasible 
that motive profiles may be combined with learning algorithms 
that learn from actual interaction and experimentation with their 
environment during strategic interactions. Reinforcement learning
variants such as frequency adjusted Q-learning (Kaisers and
Tuyls, 2010) have been specifically developed for such multi-agent
systems and suggest a starting point for such work. This would 
permit a wider range of motives to be used in NPCs. It would 
also extend existing work with intrinsically motivated NPCs from 
scenarios in which individual agents interact with their environ- 
ment to scenarios in which multiple intrinsically motivated agents 
interact with each other. 

HUMAN-COMPUTER INTERACTION 

Just as the study of computational models of motivation lies 
at the intersection of computer science and cognitive science, 
another area of future work lies at the boundary where com- 
puter and human interact. In particular, computers are increas- 
ingly applied to problems that require them to develop beliefs 
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about the motives and intentions of the humans with whom they 
interact. Maher et al. (2007), for example, propose "curious places" 
in which a building is an "immobile robot" with sensors and actuators 
permitting it to monitor and control the built environment. 
The aim of the immobile robot is to intervene proactively on 
behalf of the human and modify the environment in a manner 
that supports the human's goals. In order to do this, it must first 
identify those goals. 

The framework in this paper can be conceived as a foundation 
for agents to simulate and reason about the decision-making of 
other agents or humans. As discussed in section Mixed-Motive 
Games, the four games studied in this paper represent abstrac- 
tions of real-world interaction scenarios. A robot equipped with 
appropriate sensors might monitor the behavior of a given human 
in such scenarios and deduce their motive profile from their 
behavior. By engaging in such "autonomous mental simulation" 
of the intrinsically motivated reasoning of another, such an agent 
may ultimately be better equipped to estimate and support the 
goals of humans. 
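
One way such deduction might be approached is sketched below, purely as an illustration. The sketch assumes that the observer holds a small set of candidate optimally motivating incentives, that the observed human chooses actions by a softmax over expected perceived payoffs, and that the opponent's mixed strategy is known. None of these assumptions comes from this paper; the `perceived` transform and all function names are hypothetical.

```python
import math


def perceived(payoff, omega, v_max):
    """ASSUMPTION: illustrative perception transform, not taken from the text."""
    return v_max - abs(payoff - omega)


def choice_probability(payoffs, omega, opponent_mix, temperature=1.0):
    """Probability of action 0 under a softmax over expected perceived payoffs.

    payoffs[a][b] is the raw payoff for playing a against an opponent playing b;
    opponent_mix[b] is the assumed probability that the opponent plays b.
    """
    v_max = max(max(row) for row in payoffs)
    expected = [
        sum(p * perceived(row[b], omega, v_max) for b, p in enumerate(opponent_mix))
        for row in payoffs
    ]
    m = max(expected)
    weights = [math.exp((e - m) / temperature) for e in expected]
    return weights[0] / sum(weights)


def infer_omega(observed_actions, payoffs, candidates, opponent_mix):
    """Return the candidate omega that maximizes the log-likelihood of the
    observed sequence of actions (each action is 0 or 1)."""
    def log_likelihood(omega):
        p0 = choice_probability(payoffs, omega, opponent_mix)
        return sum(math.log(p0 if a == 0 else 1.0 - p0) for a in observed_actions)
    return max(candidates, key=log_likelihood)
```

Given a sequence of observed choices in one of the abstracted games, `infer_omega` selects the candidate incentive that best explains them. A practical system would also need to handle noisy observations and uncertainty about the opponent's strategy.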

SIMULATION OF HUMAN DECISION-MAKING 

The theories presented in this paper provide a starting point 
for developing populations of agents that can reproduce certain 
aspects of human decision-making during strategic interactions. 
Merrick and Shafi (2011) showed that it is possible to calibrate 
power, achievement and affiliation motivated agents such that 
they can accurately simulate human decision-making under certain 
constrained conditions. Specifically, their work focused on 
single-shot decisions by individual agents. The work in this paper 
provides a foundation for extending their work to scenarios in 
which agents interact. In future, such simulations may permit 
us to examine hypotheses about how individuals with different 
motives may behave during strategic interactions. 

Key research challenges in this area include understanding 
the ranges of optimally motivating incentives that best represent 
power-, affiliation- and achievement-motivated individuals. 
In practice it seems that there is 
significant overlap between individuals in the three groups. In 
addition, motivation psychologists have identified hybrid pro- 
files where more than one motive is dominant (Heckhausen 
and Heckhausen, 2008). For example, in the leadership profile 
both power and achievement motivation are believed to have 
approximately equal strength. In terms of the work in this paper, 
this would mean that agents have more than one optimally 
motivating incentive. Exploration of profiles such as this is a 
direction for future work that can provide insight into both 
the role of motivation in humans and its modeling in artificial 
systems. 